On this page, we describe the specific data sources we used.
We obtained both of our data sets from the Chicago Data Portal, a site created by the City of Chicago to host publicly available data. The portal uses the Socrata Open Data API and hosts data sets ranging from beach weather in 2016 to criminal activity from 2001 to the present. The first data set we chose was Chicago Public Schools - Progress Report Cards (2011-2012). It covers a wide range of metrics, from student performance in algebra to family involvement, and focuses on elementary and middle schools. With this as our core data set, we broadened our scope by adding a second one: Chicago Public Schools - High School Progress Report (2013-2014). This set includes fewer schools in total but more high schools than the 2011 data set, which lets us extend our analysis to high schools. It also lets us take time into account, both by comparing the high schools in 2011 with those in 2013 and by looking for changes between the middle schools in 2011 and the high schools in 2013.

To prepare the data, we first used the requests library to download both data sets. We then dropped unnecessary columns, specifically those that repeated information already present in other columns, such as x_coordinate in the 2011 data set. Finally, we converted columns to their appropriate data types. For example, we converted the rate of misconducts per 100 students to a float because it is a fractional, percentage-like value. Other values, such as the school ID and the school's name, remained unchanged.
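The download-and-clean steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the dataset ID in the URL is a placeholder, and every column name except x_coordinate (including the misconduct-rate column) is a hypothetical guess. It assumes the portal's Socrata JSON endpoint, which typically returns numeric fields as strings, hence the explicit float conversion.

```python
"""Minimal sketch of downloading and cleaning a Chicago Data Portal data set.

Assumptions (hypothetical, not from the original project):
- DATASET_URL_2011 uses a placeholder Socrata dataset ID ("xxxx-xxxx").
- Column names other than x_coordinate are illustrative guesses.
"""
from typing import Any, Dict, Iterable, List, Optional

# Placeholder endpoint: replace "xxxx-xxxx" with the real dataset ID
# shown on the data set's page on the Chicago Data Portal.
DATASET_URL_2011 = "https://data.cityofchicago.org/resource/xxxx-xxxx.json"

# Hypothetical column lists, for illustration only.
DUPLICATE_COLUMNS = ["x_coordinate"]
FLOAT_COLUMNS = ["rate_of_misconducts_per_100_students"]


def fetch_rows(url: str, limit: int = 5000) -> List[Dict[str, Any]]:
    """Download a data set from a Socrata JSON endpoint as a list of dicts."""
    # Imported inside the function so clean_rows can be used (and tested)
    # without requests installed.
    import requests

    # $limit is a SoQL query parameter; Socrata's default page size is
    # 1,000 rows, so larger data sets need an explicit limit (or paging).
    response = requests.get(url, params={"$limit": limit}, timeout=30)
    response.raise_for_status()
    return response.json()


def to_float(value: Any) -> Optional[float]:
    """Convert a value to float, mapping missing or blank entries to None."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return None
    return float(value)


def clean_rows(
    rows: Iterable[Dict[str, Any]],
    drop: Iterable[str] = DUPLICATE_COLUMNS,
    float_cols: Iterable[str] = FLOAT_COLUMNS,
) -> List[Dict[str, Any]]:
    """Drop redundant columns and cast rate columns to float.

    All other fields (e.g. school ID and school name) pass through unchanged.
    """
    drop_set = set(drop)
    float_list = list(float_cols)
    cleaned = []
    for row in rows:
        # Keep every column that is not marked as a duplicate.
        new_row = {k: v for k, v in row.items() if k not in drop_set}
        # Cast the rate columns that are present in this row.
        for col in float_list:
            if col in new_row:
                new_row[col] = to_float(new_row[col])
        cleaned.append(new_row)
    return cleaned
```

For example, calling clean_rows on a made-up row with the keys school_id, x_coordinate, and rate_of_misconducts_per_100_students (value "12.5") drops x_coordinate, turns the rate into the float 12.5, and leaves school_id as a string. Keeping the cleaning step free of network calls makes it easy to test separately from the download.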